Data Collection and Language Technologies for Mapudungun

Authors

  • Lori Levin
  • Rodolfo Vega
  • Alon Lavie
  • Eliseo Cañulef
  • Carolina Huenchullan
Abstract

Mapudungun is spoken by over 900,000 people (Mapuche) in Chile and Argentina. Thanks to an active bilingual and multicultural education program, Mapuche children are now being taught to be literate in both Mapudungun and Spanish. The Chilean Ministry of Education has teamed up with the Language Technologies Institute’s AVENUE project to collect data and produce language technologies that support bilingual education. The main resource that has come out of the Mineduc-LTI partnership is a Mapudungun-Spanish parallel corpus consisting of approximately 200,000 words of text and 120 hours of transcribed speech. Plans are being made for machine translation and computer-assisted instruction.

Similar resources

In Proceedings of LREC-2002 Workshop Data Collection and Language Technologies for Mapudungun

Mapudungun is spoken by over 900,000 people (Mapuche) in Chile and Argentina. Thanks to an active bilingual and multicultural education program, Mapuche children are now being taught to be literate in both Mapudungun and Spanish. The Chilean Ministry of Education has teamed up with the Language Technologies Institute’s AVENUE project to collect data and produce language technologies that suppor...

Data Collection and Analysis of Mapudungun Morphology for Spelling Correction

This paper describes part of a three-year collaboration between Carnegie Mellon University's Language Technologies Institute, the Programa de Educación Intercultural Bilingüe of the Chilean Ministry of Education, and Universidad de La Frontera (Temuco, Chile). We are currently constructing a spelling checker for Mapudungun, a polysynthetic language spoken by the Mapuche people in Chile and Arge...

Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua

By adopting a “first-things-first” approach we overcome a number of challenges inherent in developing NLP Systems for resource-scarce languages. By first gathering the necessary corpora and lexicons we are then enabled to build, for Mapudungun, a spelling corrector, morphological analyzer, and two Mapudungun-Spanish machine translation systems; and for Quechua, a morphological analyzer as well as...

Native and Non-native Perception of Stress in Mapudungun: Assessing Structural Maintenance in the Phonology of an Endangered Language.

Today, virtually all speakers of Mapudungun (formerly Araucanian), an endangered language of Chile and Argentina, are bilingual in Spanish. As a result, the firmness of native speaker intuitions, especially regarding perceptually complex issues such as word stress, has been called into question. Even though native intuitions are unavoidable in the investigation of stress position, efforts can be ...

Acoustic properties of the dental vs. alveolar contrast in Mapudungun

In this paper we undertake an acoustic analysis of dental and alveolar segments in Mapudungun, an indigenous language of Chile. We calculate locus equations for dental and alveolar segment pairs of different manners. We find that dentals differ from alveolars of the corresponding manner in lowering the onset F2 of following vowels. We validate these results by means of a linear mixed model anal...

Publication date: 2002